Data Description

After you graduated, you started working for one of the best firms in the country. You were hired because you have data analysis skills with \(R\). During your first week, your manager comes to your office, gives you the following data set, and asks you to “analyze the hell out of this data” (his words, not mine). Mainly, he wants you to build a linear model to predict executive salaries. But you know you can do much more! Analyze the given data and create a report as if your job depends on it.

The data are stored as a .txt file under week 10 and consist of 11 variables, plus an id column.

Load Data into R

id Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 11.4436 12 15 1 240 170 1 44 5 0 21
2 11.7753 25 14 1 510 160 1 53 9 0 28
3 11.3874 20 14 0 370 170 1 56 5 0 26
4 11.2172 3 19 1 170 170 1 26 9 0 24
5 11.6553 19 12 1 520 150 1 43 7 0 27
6 11.1619 14 13 0 420 160 1 53 9 0 27
Variable Name Description
Y salary Salary of executive
X1 experience Experience (in years)
X2 education Education (in years)
X3 gender Gender (1 if male, 0 if female)
X4 emps_sup Number of employees supervised
X5 assets Corporate assets (in millions of USD)
X6 board_mb Board member (1 if yes, 0 if no)
X7 age Age (in years)
X8 profit Company profits (in millions of USD)
X9 int_res Has international responsibility (1 if yes, 0 if no)
X10 sales Company’s total sales (in millions of USD)

Read Table
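The report's R code for this step is not shown. In R, a whitespace-delimited file with a header row is usually read with read.table(path, header = TRUE). The short Python sketch below shows what that step does by parsing such a table into records.

It makes one assumption: the file looks like the six rows printed above. The actual file name under week 10 is not given, so those rows are embedded as a string, and the helper read_table is hypothetical.

```python
# Assumption: the .txt file is whitespace-delimited with a header row,
# like the six rows printed above. The real file name is not given,
# so a sample of the data is embedded as a string for illustration.
RAW = """id Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 11.4436 12 15 1 240 170 1 44 5 0 21
2 11.7753 25 14 1 510 160 1 53 9 0 28
3 11.3874 20 14 0 370 170 1 56 5 0 26
4 11.2172 3 19 1 170 170 1 26 9 0 24
5 11.6553 19 12 1 520 150 1 43 7 0 27
6 11.1619 14 13 0 420 160 1 53 9 0 27
"""

def read_table(text):
    """Parse a whitespace-delimited table with a header row into a list of dicts of floats."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in body]

rows = read_table(RAW)
print(len(rows), rows[0]["Y"])
```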

Rename Columns
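Here is a minimal sketch of the renaming step, assuming the code-to-name mapping from the variable table above. It drops the id column because neither the printed data below nor the skim output (11 variables) includes it.

In R, the same idea can be expressed by assigning to names(df) or with dplyr::rename(). The Python helper rename_columns is hypothetical.

```python
# Mapping from the original column codes to the descriptive names used
# in the rest of this report (taken from the variable table above).
NEW_NAMES = {
    "Y": "salary", "X1": "experience", "X2": "education", "X3": "gender",
    "X4": "emps_sup", "X5": "assets", "X6": "board_mb", "X7": "age",
    "X8": "profit", "X9": "int_res", "X10": "sales",
}

def rename_columns(rows, mapping):
    """Return new dicts with keys renamed; keys missing from mapping (such as 'id') are dropped."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]

# First row of the data set, as printed above.
sample = [{"id": 1, "Y": 11.4436, "X1": 12, "X2": 15, "X3": 1, "X4": 240,
           "X5": 170, "X6": 1, "X7": 44, "X8": 5, "X9": 0, "X10": 21}]
renamed = rename_columns(sample, NEW_NAMES)
print(renamed[0]["salary"], renamed[0]["emps_sup"])
```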

Conduct EDA

Looking at Raw Values

salary experience education gender emps_sup assets board_mb age profit int_res sales
11.4436 12 15 1 240 170 1 44 5 0 21
11.7753 25 14 1 510 160 1 53 9 0 28
11.3874 20 14 0 370 170 1 56 5 0 26
11.2172 3 19 1 170 170 1 26 9 0 24
11.6553 19 12 1 520 150 1 43 7 0 27
11.1619 14 13 0 420 160 1 53 9 0 27
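  • In the console:

    • library(ISLR)

    • View(df) opens the data frame in a spreadsheet-style viewer.

    • ?df opens the help page for R’s F-distribution density function df(), not a description of this data set. The variable table above documents the columns instead.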

Computing Summary Statistics

Skim summary statistics
 n obs: 100 
 n variables: 11 

── Variable type:factor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n n_unique          top_counts ordered
 board_mb       0      100 100        2 0: 51, 1: 49, NA: 0   FALSE
   gender       0      100 100        2 1: 66, 0: 34, NA: 0   FALSE
  int_res       0      100 100        2 0: 82, 1: 18, NA: 0   FALSE

── Variable type:integer ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   variable missing complete   n   mean     sd  p0    p25   p50    p75 p100     hist
        age       0      100 100  42.84   9.07  23  37     42.5  49.25   64 ▃▃▇▇▆▆▃▂
     assets       0      100 100 175.1   15.41 150 160    180   190     200 ▃▇▁▆▇▁▇▃
  education       0      100 100  16.02   2.3   12  14     16    18      20 ▇▃▅▅▆▆▆▁
   emps_sup       0      100 100 340.1  167.18  60 187.5  360   492.5   600 ▇▆▅▆▇▆▇▇
 experience       0      100 100  13.08   7.34   1   7.75  13    20      26 ▇▃▆▇▃▃▇▅
     profit       0      100 100   7.7    1.55   5   6      8     9      10 ▂▇▁▇▆▁▇▆
      sales       0      100 100  24.83   2.74  20  23     25    27      30 ▃▃▃▇▂▃▂▃

── Variable type:numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n  mean   sd    p0   p25   p50   p75  p100     hist
   salary       0      100 100 11.46 0.26 10.66 11.28 11.46 11.61 12.06 ▁▁▃▇▇▇▃▂
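The output above appears to come from the skimr package’s skim(). The Python sketch below recomputes the same kind of numeric summary for one column: mean, sample standard deviation, and the 0/25/50/75/100th percentiles.

It uses only the six experience values printed earlier, so its numbers will not match the full 100-row table. The helper skim_numeric is hypothetical.

```python
import statistics

def skim_numeric(values):
    """Summary statistics similar to skimr's numeric columns:
    mean, sample sd, and the 0/25/50/75/100th percentiles."""
    # method="inclusive" interpolates linearly between order statistics,
    # which matches R's default quantile() (type 7).
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample sd (n - 1 denominator), as in R's sd()
        "p0": min(values), "p25": p25, "p50": p50, "p75": p75,
        "p100": max(values),
    }

# Experience values from the six rows printed above (not the full 100 rows).
experience = [12, 25, 20, 3, 19, 14]
stats = skim_numeric(experience)
print({k: round(v, 2) for k, v in stats.items()})
```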

Description of Summary Statistics

  • The minimum value of age = 23, assets = 150, education = 12, emps_sup = 60, experience = 1, profit = 5, sales = 20, and salary = 10.66.

  • The maximum value of age = 64, assets = 200, education = 20, emps_sup = 600, experience = 26, profit = 10, sales = 30, and salary = 12.06.

  • The mean value of age = 42.84, assets = 175.1, education = 16.02, emps_sup = 340.1, experience = 13.08, profit = 7.7, sales = 24.83, salary = 11.46.

  • The standard deviation of age = 9.07, assets = 15.41, education = 2.3, emps_sup = 167.18, experience = 7.34, profit = 1.55, sales = 2.74, salary = 0.26.

  • From these histograms we can see that:

# A tibble: 8 x 2
  Variable   Distribution  
  <chr>      <chr>         
1 age        Normal        
2 assets     Random        
3 education  Mostly Uniform
4 emps_sup   Mostly Uniform
5 experience Random        
6 profit     Random        
7 sales      Skewed Right  
8 salary     Skewed Left   
  • age: more executives are aged between 33.77 and 51.91 (one standard deviation either side of the mean of 42.84) than outside this range.

  • assets: corporate assets show no clear pattern across the sample.

  • education: years of education are spread roughly evenly across the observed range.

  • emps_sup: the number of employees supervised is spread roughly evenly across the observed range.

  • experience: years of experience show no clear pattern across the sample.

  • profit: company profits show no clear pattern across the sample.

  • sales: half of the companies have total sales of $25 million or less (median = 25), and the largest value is $30 million.

  • salary: half of the executives earn $11.46 million or more (median = 11.46), and the longer tail extends toward lower salaries.
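The age range quoted above is simply the mean plus or minus one standard deviation. A skewness label can also be checked numerically rather than read off a sparkline. The sketch below does both.

It assumes the moment-based sample skewness, which is one of several common definitions. The toy data in the second call are illustrative only and are not taken from this data set.

```python
import statistics

def one_sd_interval(mean, sd):
    """Interval within one standard deviation of the mean, rounded to 2 decimals."""
    return round(mean - sd, 2), round(mean + sd, 2)

def sample_skewness(values):
    """Moment-based sample skewness g1: positive means a longer right tail,
    negative a longer left tail."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Age summary from the skim output: mean 42.84, sd 9.07.
print(one_sd_interval(42.84, 9.07))  # (33.77, 51.91)

# Toy data (illustrative only): a long right tail gives positive skewness.
print(sample_skewness([1, 2, 2, 3, 3, 3, 10]) > 0)  # True
```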

Creating Data Visualizations

Visual Descriptions:

  • There are more male executives (1) than female executives (0): 66 versus 34.

  • The numbers of board members (1) and non-board members (0) are approximately the same, with slightly more non-board members (51 versus 49).

  • Far fewer executives have international responsibility (1) than do not (0): 18 versus 82.

  • The count of executives with 20 years of education is much lower than the counts for education levels from 12.5 to just under 20 years.

  • The most common ages are between 30 and 45 years.

  • Executives aged 60 and older are the least common age group, compared with those between 30 and 45 years of age.

  • The mean salary for males (1) is higher than the mean salary for females (0).

  • There is a positive linear relationship between a person’s experience (in years) and their salary.

  • There is a positive linear relationship between a person’s age and their salary.

  • The mean salaries for executives with international responsibility (1) and without it (0) are approximately equal.
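Comparing mean salary across the levels of a 0/1 variable, as in the gender and int_res descriptions above, is a grouped mean. In R, one way to do this is aggregate(salary ~ gender, data = df, FUN = mean).

The Python sketch below applies the same idea to the six rows printed earlier only, so its means are illustrative rather than the report’s results. The helper group_means is hypothetical.

```python
from collections import defaultdict
import statistics

def group_means(rows, group_key, value_key):
    """Mean of value_key within each level of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {level: statistics.fmean(vals) for level, vals in groups.items()}

# (gender, salary) pairs from the six rows printed above; 1 = male, 0 = female.
rows = [
    {"gender": 1, "salary": 11.4436}, {"gender": 1, "salary": 11.7753},
    {"gender": 0, "salary": 11.3874}, {"gender": 1, "salary": 11.2172},
    {"gender": 1, "salary": 11.6553}, {"gender": 0, "salary": 11.1619},
]
means = group_means(rows, "gender", "salary")
print(means)
```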

Creating Linear Models

Experience, Education, Gender, and Assets: Parallel Slopes Model

Using a linear model with parallel slopes, we can predict an executive’s salary (in millions) based on their experience, education, gender, and assets.

Experience, education, gender, and assets each show a significant positive relationship with salary, so all four are included in our linear model.
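The strength of a linear relationship, such as the one between experience and salary, can be summarized with the Pearson correlation coefficient. In R, cor() computes it and cor.test() adds a significance test.

The sketch below computes r from its definition for the six rows printed earlier. With so few rows it says nothing about significance, and it does not reproduce the full-data value. The helper pearson_r is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Experience and salary from the six rows printed above.
experience = [12, 25, 20, 3, 19, 14]
salary = [11.4436, 11.7753, 11.3874, 11.2172, 11.6553, 11.1619]
r = pearson_r(experience, salary)
print(round(r, 3))
```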

\(\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets + 0.185 \cdot 1_{Male}(x)\)

Male executive model: \(\hat{Salary} = 10.325 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\)

Female executive model: \(\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\)
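Here is a minimal sketch of how the fitted parallel slopes model produces predictions, using the rounded coefficients above. The executive profile is hypothetical.

Because both gender lines share the same slopes, the predicted male-minus-female gap is a constant 0.185 whatever the other inputs are. In R, calling predict() on the fitted lm object with a newdata data frame does the same job with unrounded coefficients.

```python
# Coefficients of the parallel slopes model reported above
# (rounded as in the text; salary in millions of USD, as the report states).
INTERCEPT = 10.14
B_EXPERIENCE = 0.027
B_EDUCATION = 0.022
B_ASSETS = 0.003
B_MALE = 0.185

def predict_salary(experience, education, assets, male):
    """Predicted salary; `male` is the 1_Male indicator (1 for male, 0 for female)."""
    return (INTERCEPT + B_EXPERIENCE * experience + B_EDUCATION * education
            + B_ASSETS * assets + B_MALE * male)

# Hypothetical executive: 10 years' experience, 16 years' education, $175M assets.
male = predict_salary(10, 16, 175, male=1)
female = predict_salary(10, 16, 175, male=0)
print(round(male, 3), round(female, 3), round(male - female, 3))
```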

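In our base model, we could extrapolate that a female executive with no experience, no education, and zero corporate assets would earn $10.14 million. This lies far outside the observed data: education ranges from 12 to 20 years and assets from $150 to $200 million. Holding the other variables fixed, each extra year of experience adds about $27,000 to the expected salary, each extra year of education adds about $22,000, and each additional $1 million in corporate assets adds about $3,000. Male executives, on average, make $185,000 more than female counterparts with similar experience, education, and assets.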

Experience & Gender: Interaction Model

Using an interaction model, we include experience, gender, and their product to see whether the relationship between experience and salary differs by gender.

\(\hat{Salary} = 11 + 0.026 \cdot experience + 0.174 \cdot 1_{Male}(x) + 0.002 \cdot experience \cdot 1_{Male}(x)\)

Female experience model: \(\hat{Salary}_F = 11 + 0.026 \cdot experience\)

Male experience model: \(\hat{Salary}_M = 11.174 + 0.028 \cdot experience\)

As the models show, male executives have a higher baseline salary than female executives (11.174 versus 11). They also have a marginally steeper increase in salary with experience (0.028 versus 0.026 per year). However, as the graph shows, this interaction between experience and gender is negligible: both genders see salary increase with experience at nearly the same rate.
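The gender-specific lines follow directly from the interaction coefficients: the male intercept is 11 + 0.174 and the male slope is 0.026 + 0.002. The sketch below derives both lines and the predicted male-minus-female gap.

That gap grows by only 0.002 per year of experience, from 0.174 at 0 years to 0.214 at 20 years. This is consistent with the negligible interaction described above. The helpers group_line and gap are hypothetical.

```python
# Coefficients of the interaction model reported above.
B0, B_EXP, B_MALE, B_EXP_MALE = 11.0, 0.026, 0.174, 0.002

def group_line(male):
    """(intercept, slope) of the salary-vs-experience line for one gender (male = 1 or 0)."""
    return B0 + B_MALE * male, B_EXP + B_EXP_MALE * male

def gap(experience):
    """Predicted male-minus-female salary difference at a given experience level."""
    return B_MALE + B_EXP_MALE * experience

female_line = group_line(0)
male_line = group_line(1)
print(female_line, [round(v, 3) for v in male_line], round(gap(20), 3))
```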

Experience & Education: Another Linear Model